Data Visualization

A tour of favourite hits

Ayush Patel

Hello!

  • I am Ayush.
  • I work at the intersection of data, policy and development.
  • I sometimes teach data analysis skills using R to those who will suffer me.

What is the talk about?

I think the two must-read books on data viz are Data Visualization by Kieran Healy and Fundamentals of Data Visualization by Claus Wilke.

  • I will walk through some interesting examples from these books.
  • We will look at motivations for data visualization
  • Markers of good and bad data visualization
  • Some common mistakes to avoid
  • Where and how to learn more

Can you tell me something?

Here is the number of times a day I daydream about food, recorded over two weeks:

 [1] 19 27 24 27 27 24 24 22 25 22 25 31 21 23

Here is the number of times I tried to complete this presentation over the same period of time:

 [1] 28 27 28 24 26 15 31 19 22 19 34 34 25 20

It is neither easy nor intuitive to look at raw numbers and say something about them, even when summary statistics are provided. Let us look at this more closely.
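As a minimal sketch of what "summary stats" could mean here, the two series above can be summarised in base R. The variable names `daydreams` and `attempts` and the helper `summarise_series` are made up for illustration.

```r
# Daily counts copied from the two series above
daydreams <- c(19, 27, 24, 27, 27, 24, 24, 22, 25, 22, 25, 31, 21, 23)
attempts  <- c(28, 27, 28, 24, 26, 15, 31, 19, 22, 19, 34, 34, 25, 20)

# Hypothetical helper: a few common summary statistics for one series
summarise_series <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
}

# One row per series, one column per statistic
stats <- rbind(daydreams = summarise_series(daydreams),
               attempts  = summarise_series(attempts))
print(round(stats, 2))
```

Even with this table in hand, it is hard to picture how the two series behave from day to day, which is the point of the slide.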

You all know of the quartet - Anscombe and friends

   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
# A tibble: 4 × 3
  variable  mean    sd
  <chr>    <dbl> <dbl>
1 y1        7.50  2.03
2 y2        7.50  2.03
3 y3        7.5   2.03
4 y4        7.50  2.03

Corr x1, y1: 0.8164205
Corr x2, y2: 0.8162365
Corr x3, y3: 0.8162867
Corr x4, y4: 0.8165214
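The numbers above can be reproduced with a short base R sketch. It uses the `anscombe` data frame that ships with R; the name `summary_table` and the column names are my own choices, not the code behind the original slide.

```r
# `anscombe` ships with base R (datasets package): columns x1..x4 and y1..y4
data(anscombe)

sets <- 1:4
summary_table <- data.frame(
  set     = sets,
  mean_y  = sapply(sets, function(i) mean(anscombe[[paste0("y", i)]])),
  sd_y    = sapply(sets, function(i) sd(anscombe[[paste0("y", i)]])),
  corr_xy = sapply(sets, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)
print(summary_table, digits = 4)
```

All four sets agree on the mean, the standard deviation and the correlation to about two decimal places.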

Things are not always as they seem

Anscombe’s quartet, from Data Visualization by Healy
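A figure like this one can be sketched in base R graphics. This is not Healy's code: the layout, axis limits and colours below are assumptions chosen for illustration.

```r
data(anscombe)

# Fit a straight line y = a + b * x separately for each of the four sets
fits <- lapply(1:4, function(i) {
  d <- data.frame(x = anscombe[[paste0("x", i)]],
                  y = anscombe[[paste0("y", i)]])
  lm(y ~ x, data = d)
})
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")

# 2 x 2 grid of scatter plots on shared axes, each with its fitted line
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlim = c(3, 20), ylim = c(2, 14),
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i), pch = 19)
  abline(fits[[i]], col = "red")
}
par(old_par)
```

The fitted lines are nearly identical across the panels, yet the point clouds look nothing alike.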

Clearly, looking at data helps

  • Helps build an intuitive understanding of the data
  • Identifies patterns, sometimes expected, sometimes unexpected
  • Conveys a lot of information in a concise, accessible and memorable manner
  • All of these points hold for the people who create a visualization as well as for those who read it

Can we identify an effective vs a bad visualization?

Napoleon’s retreat from Russia by Minard, from Data Visualization by Healy

‘Monstrous Costs’ by Nigel Holmes, from Data Visualization by Healy

Rainfall in Glasgow and Edinburgh, from Cara Thompson’s More than pretty graphs

Tufte on Visualization

“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency. … [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space … [It] is nearly always multivariate … And graphical excellence requires telling the truth about the data.” (Tufte, 1983, p. 51)

Markers of a good visualization

  • Not just about how it looks, though this makes the graph memorable
  • Depends on who is looking at it
  • Depends on why they are looking at it, i.e. what they expect from the chart
  • Essentially, you need both good taste and an understanding of how human visual perception works. The latter can be learned with practice, and in relatively less time than the former. Good taste has to be developed, and that takes time.

A discussion on this figure

Reference to NYT graph, from Data Visualization by Healy

Identifying features of bad visualizations

The following problems are distinct but can appear in various combinations in a given figure.

  • Strictly Aesthetic
  • Substantive - the data presented is somehow off
  • Perceptual

Back to Democracy - what are the good things?

Reference to NYT graph, from Data Visualization by Healy

What if I tell you

  • These are not the responses from a longitudinal survey
  • Actually, it is the same question asked of people born in different decades, i.e. of different age groups
  • Cherry on top: it was not a binary question
  • Identifying substantive problems requires understanding the data underlying a chart and being observant of any transformations and their consequences

A good Samaritan

Voeten’s response to NYT graph, from Data Visualization by Healy

One for the finance bros

Liz Ann Sonders, Chief Investment Strategist with Charles Schwab, Inc., from Data Visualization by Healy

What if it was shown this way?

Healy’s examples for possible manipulations from Data Visualization by Healy

How to address such issues - Healy’s Alternative

Healy’s alternative to the index vs monetary base chart from Data Visualization by Healy

Garden variety mistakes - Single variable distributions

Bin width example from Fundamentals of Data Visualization by Wilke
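The effect of bin width is easy to try out. The sketch below is not Wilke's example: it assumes the built-in `faithful$eruptions` data as a stand-in, and the three widths are arbitrary choices meant to look too narrow, reasonable and too wide.

```r
# Assumption: `faithful$eruptions` (built into R) stands in for the book's data
x <- faithful$eruptions

# Candidate bin widths: too narrow, reasonable, too wide (illustrative choices)
widths <- c(0.05, 0.25, 1)

# Same data, three histograms; fixed limits 1.5 to 5.5 cover the range of x
old_par <- par(mfrow = c(1, 3), mar = c(4, 4, 2, 1))
hists <- lapply(widths, function(w) {
  hist(x, breaks = seq(1.5, 5.5, by = w),
       main = paste("Bin width", w), xlab = "Eruption duration (min)")
})
par(old_par)
```

With very narrow bins the histogram is noisy, and with very wide bins the two clusters in this data blur together. Trying several widths before settling on one is a cheap safeguard.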

Garden variety mistakes - Single variable distributions

Incorrect data representation example from Fundamentals of Data Visualization by Wilke

Garden variety mistakes - Multiple distributions

Multiple Distribution common error from Fundamentals of Data Visualization by Wilke

Alternatives from Wilke

for multiple distributions

Alternatives for Multiple Distribution from Fundamentals of Data Visualization by Wilke
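One alternative for comparing a few groups is to overlay semi-transparent density curves instead of stacking or overlapping histograms. The sketch below is not Wilke's code: it assumes the built-in `iris` data as a stand-in, and the colours and padding are arbitrary.

```r
# Assumption: `iris` (built into R) stands in for the book's data
groups <- split(iris$Sepal.Length, iris$Species)

# One kernel density estimate per group, evaluated over a shared padded range
rng  <- range(iris$Sepal.Length)
dens <- lapply(groups, density, from = rng[1] - 1, to = rng[2] + 1)

cols <- c("steelblue", "darkorange", "forestgreen")
fill <- adjustcolor(cols, alpha.f = 0.4)  # semi-transparent fills

# Empty canvas sized to fit every curve, then one filled polygon per group
plot(NULL, xlim = rng + c(-1, 1),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     xlab = "Sepal length (cm)", ylab = "Density",
     main = "Overlapping density curves by species")
for (i in seq_along(dens)) {
  polygon(dens[[i]]$x, dens[[i]]$y, col = fill[i], border = cols[i])
}
legend("topright", legend = names(dens), fill = fill)
```

This tends to work for a handful of groups. With many groups the curves get crowded, and separate panels per group may read better.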

Garden variety mistakes - Unordered bar charts
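A minimal sketch of the fix, assuming the built-in `chickwts` data as a stand-in: compute one value per category, then sort the bars by that value instead of leaving them in the default alphabetical order.

```r
# Assumption: `chickwts` (built into R) stands in for whatever categories you plot
mean_weight <- tapply(chickwts$weight, chickwts$feed, mean)

# Default order is alphabetical, which carries no meaning; sort by value instead
ordered_means <- sort(mean_weight, decreasing = TRUE)

# Side by side: the same bars, unordered and ordered
old_par <- par(mfrow = c(1, 2), mar = c(7, 4, 2, 1))
barplot(mean_weight, las = 2, main = "Alphabetical order",
        ylab = "Mean weight (g)")
barplot(ordered_means, las = 2, main = "Ordered by value",
        ylab = "Mean weight (g)")
par(old_par)
```

Sorting by value only makes sense when the categories have no natural order of their own. Ordered categories such as age groups should keep their inherent order.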

Resources -

Thank you.